Shi et al. BMC Genomics 2013, 14:103 
http://www.biomedcentral.com/1471 -21 64/1 4/1 03 



Genomics 



RESEARCH ARTICLE Open Access 



Pause-melting misalignment: a novel model for 
the birth and motif indel of tandem repeats in 
the mitochondrial genome 

Wei Shi 1 , Xiao-Yu Kong 1 ", Zhong-Ming Wang 1 , Shan-Shan Yu 1 , Hai-Xia Chen 2 and Elizabeth A De Stasio 3 



Abstract 

Background: Tandem repeats (TRs) in the mitochondrial (mt) genome control region have been documented in a 
wide variety of vertebrate species. The mechanism by which repeated tracts originate and undergo duplication and 
deletion, however, remains unclear. 

Results: We analyzed DNA sequences of mt genome TRs (mtTRs) in the ridged-eye flounder {Pleuronichthys 
cornutus), and characterized DNA sequences of mtTRs from other vertebrates using the data available in GenBank. 
Tandem repeats are concentrated in the control regions; however, we found approximately 16.6% of the TRs 
elsewhere in the mt genome. The flounder mtTRs possess three motif types with hypervariable characteristics at 
the 3' end of the control region (CR). 

Conclusion: Based on our analysis of this larger dataset of mtTR sequences, we propose a novel model of Pause 
Melting Misalignment (PMM) to describe the birth and motif indel of tandem repeats. PMM is activated during a 
pause event in mitochondrial replication in which a dynamic competition between the nascent (N) heavy strand 
and the displaced (D) heavy strand may lead to the melting of the N-strand from the template CD light strand. 
When mispairing occurs during rebinding of the N-strand, one or several motifs can be inserted or deleted in both 
strands during the next round of mt-replication or repair. This model can explain the characteristics of TRs in 
available vertebrate mt genomes. 



Background 

The mitochondrial (mt) genome appears to evolve faster 
than the single-copy fraction of the nuclear genome by 
approximately a factor of ten [1]. The portion of the mt 
genome that evolves most rapidly is the control region 
(CR), or the D-loop region, presumably due to lower 
selection pressure [1-3]. Within the control region of 
vertebrate mt genomes, tandem repeats (TRs) located at 
the CR termini (the 3'- or 5'- end) evolve most rapidly, 
with extensive variations in motif size and copy number 
resulting from motif insertions and deletions (indels) in 
TR tracts [4-7]. 

To date, several hypotheses have been proposed to ex- 
plain the birth and indel of mt genome tandem repeats 
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(mtTRs). Two related theories explaining the mechanism 
by which motif indels are created in nuclear genomes 
have been applied to similar events in mt genomes. In 
one such theory, recombination, which has been 
confirmed in mt genomes [5,8-11], is responsible. In this 
case, double strand breakage and rejoining of DNA lead 
to the gain or loss of mtTR motifs. An alternative hy- 
pothesis, the Slipped-Strand Mispairing (SSM) model 
[12], not only accounts for microsatellite indels [13-15], 
but also for motif indels in mtTRs [6,16,17]. In its sim- 
plest form, the SSM model involves local denaturation 
and displacement of the strands of a DNA duplex, 
followed by mispairing of complementary bases at the 
site of an existing similar sequences or short tandem 
repeat. When followed by replication or repair, misalign- 
ment could lead to indels of one or several simple 
repeats. 

A few models have been proposed to explain particular 
indel events in mt genomes [18-21], and subsequently 
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have been adopted in some other cases [5,22-24]. For ex- 
ample, the Illegitimate Elongation Model (IEM) 
suggested by Buroker [18] involves a competitive equi- 
librium between the nascent (N) heavy strand and the 
displaced (D) heavy strand while the N-strand is arrested 
at a termination-associated sequence (TAS) in the 5' end 
of the CR. This model suggests that frequent misalign- 
ment occurs prior to elongation in the repeat region and 
is facilitated by a stable secondary structure in the 
D-strand. This mechanism could result in both length 
and sequence heteroplasmy of TRs. Another influential 
model, the Improper Initiation Model (IIM), was 
proposed by Broughton and Dowling [19]. In this 
process, mt-replication begins at the upstream tRNA Phe 
gene, an improper origin of mt-replication. If mt- 
replication is completed at the regular termination site, 
the sequence between tRNA Phe and the normal origin of 
replication will be repeated on the nascent heavy strand. 
Under these circumstances, the repeated sequence will 
always be identical in length and position. Other 
mechanisms used to explain the origin of special motif 
indels include that of Taylor and Breden [20], who 
present a model for mini-satellite birth from non- 
contiguous repeats in the mt genome of the guppy, 
Poecilia reticulate, and that of Hayasaka [21], used to 
explain the motif indels of mtTRs involving conserved 
sequence blocks 2 and 3 (CSB2, CSB3) in the CR of the 
Japanese monkey, Macaca fuscata. 

Each of these models possesses certain limitations. Previ- 
ous studies have provided some evidence of recombination 
in mt genomes, but the incidence of mitochondrial DNA 
recombination is thought to be rare [7,25-27]. In fact, 
indels of repeat regions in mtDNA are too abundant to be 
explained solely by the recombination model. A prediction 
of the SSM model, due to its intra-helical nature, is that 
there should be an appreciable bias toward the duplication 
of short simple repeat units; therefore, the likelihood of a 
tandem repeat forming would decrease rapidly as the 
length of the repeat motif increases [20]. Consequently, this 
model is unlikely to explain the higher proportion of long 
motif repeats. Both IEM and IIM best explain the forma- 
tion of motif indels only in a particular position of the CR. 

How do tandem repeats form, and subsequently insert 
and delete in mt genomes, and why do they have high 
sequence and length heteroplasmy even within an indi- 
vidual? In order to address these questions, we analyzed 
the sequences of highly variable mtTRs in the ridged-eye 
flounder (Pleuronichthys cornutus), and explored the 
characteristics of vertebrate mtTRs using data from 
GenBank. We then developed a novel model for the 
birth and motif indel of TRs, in the hope of further 
understanding the evolutionary mechanism of mtTRs 
variability, and the evolution of mt genomes more 
generally. 



Methods 

The amplification of R cornutus CRs 

We used 20 individuals of P. cornutus, thirteen from 
Dongshan, East China Sea and seven from Qingdao, Yel- 
low Sea. A portion of the epaxial musculature was excised 
from each fresh specimen, and immediately stored at 
-70°C. Total genomic DNA was extracted using an 
SQ Tissue DNA Kit (OMEGA, China) following the 
manufacturers protocol. Two primer pairs (L-DL: ACTC 
CCAAAGCCAGGATTCT, H-DL: GAGGGTGAGGTT 
TAACGGGGG; L-AP1: GCCTGTAGCTTTTTAGGTAT, 
H-AP2: AAGCATAACACTGAAGATG) were designed 
for amplification of P. cornutus control region sequences. 
The PCR was performed in a 25 ul reaction volume 
containing 2.0 mM MgCl 2 , 0.4 mM dNTP, 0.5 uM of each 
primer, 1.0 U Taq polymerase (Takara, Dalian, China), 
2.5 ul lOx Taq buffer, and approximately 50 ng DNA 
template. PCR cycling conditions included an initial 
denaturation at 95°C for 3 min, 35 cycles of a denaturation 
at 94°C for 45 s, an annealing step at 48°C for 45 s, and 
elongation at 72°C for 3 min with a final extension at 72°C 
for 5 min. The PCR products were detected in 1.0% 
agarose gels and purified using a Takara Agarose Gel DNA 
Purification Kit (Takara, China). Purified products were 
inserted into the PMD18-T vector (Takara, China), and 
then transformed into E. coli competent cells. The final 
cloned products were sequenced in both directions using 
the ABI 3730 genetic Analyzer. Primers used in sequencing 
were the same as those used for PCR, and new primers 
were designed for walking sequencing. The sequenced 
fragments were assembled using DAMBE v 5 [28] and 
BioEdit v 7.0.1 [29]. 

GenBank database and TRs analysis 

The sequences of 1726 vertebrate mt genomes were 
retrieved from the NCBI genome database in March, 
2012 (http://www.ncbi.nlm.nih.gov/genome). Repeated 
sequences were identified using Tandem Repeats 
Finder v4.03 [30] with the following parameters: 
match +2, mismatch -7, indel -7. Only those repeats 
whose scores exceeded 50 were reported. Looser or 
stricter parameters were also tested in the same soft- 
ware packages, but these resulted in more parado- 
xical repeats and the loss of some repeats from the 
dataset. 

The location of TRs in mt genomes from GenBank 
was manually checked, and TRs located in the CRs 
were classified by their location relative to the 3' or 
5' end. Those TRs without a clear location in the CR 
were not used in the statistical analysis. Utilizing 
SPSS software, the distributions of motif lengths in 
the 3' and 5' ends and in protein coding genes were 
analyzed. 
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Results and discussion 

The configurations of P. cornutus mtTRs 

A total of 109 CR fragments, including complete repeat 
tracts, were obtained by PCR amplification and cloning 
from 20 P. cornutus individuals (GenBank accession nos. 
JX457484-JX457592). The results show that all 109 CR 
fragments have repeat tracts with two or three motif 
types at their 3' ends. The three motif types are 
designated: 5'-TR, M-TR, and 3'-TR based on their rela- 
tive position in the CRs, with the 5'-TR near CSB3, the 
3'-TR near tRNA Phe , and the M-TR between the 5'-TR 
and 3'-TR (Figure 1). The 5'- and 3'-TR motifs were 
detected in all 20 individuals while only two individuals 
(QD-07 and DS-03) contained the M-TR repeat The 
motif sequence of the 5'-TR is ATATTACA and that of 
the M-TR motifs are shown in Table 1. The motif se- 
quence of the 3'-TR is TTTAATGT; those 3'-TR motifs 
that varied by a single base are 3'-TR-vl (TGTAATGT) 
and 3'-TR-v2 (TATAATGT; Figures 1 and 2). 

Hypervariable characteristics of B'-terminal repeats 

Comparisons of 109 3'-TRs showed that the length and 
the arrangement of 3'-TR and 3'-TR-v differ consider- 
ably not only between individuals, but also in different 
clones from single individuals (Figure 2). Extensive 
length polymorphism in these tracts is caused by 
differing motif copy number, ranging from four (DS-04, 
32 bp) to 51 copies (QD-07, 408 bp) between 
individuals, and from four (DS-04, 32 bp) to 26 copies 
(DS-04, 208 bp) within individuals (Figure 2). A variable 
number of 3'-TR-v motifs are scattered throughout the 
3'-TR tracts. The intra- individual differences are as 
follows: (i) One motif (3'-TR-vl) in DS-05(4), and two 
motifs (3'-TR-vl and 3'-TR) in DS-11(3) are inserted, (ii) 
From one to six motifs are deleted at DS-13(6), QD-06 
(100), QD-07. (iii) While we can be sure that the 39 th 
copy of the motif is deleted in QD-07(8, 16, and 174), 
site of deletion within other clones cannot be easily 
identified. For example, a deletion of the 3'-TR-v2 in 
QD-07 (6) could have occurred at any site from the 5 th 
to the 29 th copy of the motif (see Figure 2). 

The variation of copy number and motif organization 
between individuals QD-02, QD-03, and QD-04 is 



Table 1 The sequences of M-TR motif in 20 individuals of 
P. cornutus 



Individuals 


Sequences 


18 specimens 


mATAACACTAmATCAAAATACTCAAAm 
GTGGTGCCCAGGATAmAGAACACmAATG* 


DS-03 


(TTTAAAACACTAmATCAAAATACTCAAAm 
GTGGTGCCCAGGATAmAGAACAC) 2 TTTAATG 


QD-07 


mGT(AACACTAmATCAAAATACTCAAAm 
GTGGTGCCCAGGATAmAG) 4 , 5 AACACmAATG 


*The sequences in 18 specimens are not repeated. Bold font: The bases at the 
5'end of the repeat and bases identifying the 3' flanking region. 



negligible as the difference of copy number is no more 
than one copy. TR diversity is seen between individuals, 
particularly in comparison to QD-07, as the motif 
organization and repeat length in this individual are en- 
tirely different, and the sequence of the 3'-TR-vl motif 
TGTAATGT, is changed to 3 / -TR-v2 (TATAATGT). The 
motif organization within a TR in individuals from diffe- 
rent locations is not correlated with geographic location, 
indicating that TR organization is not geographically 
partitioned. For example, the TR organization of indi- 
viduals QD-02, 03, 04 and DS-06, and 10 differ by a few 
3'-TR motifs, but the TR organization of most other 
individuals vary to a great extent (Figure 2) 

We suggest that: (i) There is remarkable sequence het- 
erogeneity of 3'-TR tracts not only between individuals 
but also within individuals, (ii) Indel events occur ran- 
domly rather than at fixed sites within TR tracts, (iii) 
The indel unit of mtTRs could be one or multiple cop- 
ies, suggesting that the indel tempo of mtTRs can be 
greatly increased by indel events of multiple motifs, (iv) 
There are two single base variants of S'-TR^-TR-vl and 
3'-TR-v2, and the arrangement of 3'-TR and 3'-TR-v is 
highly diverse. 

Hypervariable features cannot be explained by available 
models 

These data suggest that the low frequency of recombi- 
nation events could not lead to such high levels of mtTR 
variation in both organization and length, thus it is 
impossible to explain the hypervariable features of 
P. cornutus TRs by the recombination model alone. Al- 
though the SSM model could explain such frequent 



Control Region (CR) 




3'-TR-v 

Tandem repeated tracts 

Figure 1 An illustration of the location of three motif types in P. cornutus control region. CytB: Cytochrome b gene; Thr, Pro, Phe: tRNA of 
Threonine, Proline, Phenylalanine; 12S: Small subunit ribosomal DNA. 
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3'-TR motif: | TTTAATGT 
3'-TR-v1 motif: [] TGTAATGT 
3'-TR-v2 motif: [] TATAATGT 
# unfinished sequences 



Figure 2 Comparisons of 3'-TR tracts based on 109 mtTR fragments from P. cornutus. "QD" indicates the samples from Qingdao of Yellow 
Sea, and "DS" from Dongshan of East China Sea. The first digit indicates sample number while the bracketed digit is clone number. 



variation, this model more easily explains the formation 
of shorter simple motifs (microsatellites) than the forma- 
tion of longer repeated motifs [12], If SSM is responsible 
for the mtTR variability in P. cornutus, simple shorter 
motifs would appear in higher abundance. For example, 
the repeated motif of 3'-TR-v2 (TA) 2 ATGT would in- 
clude simple repeats, such as (TA) 3> 4> ATGT rather 
than (TATAATGT) 2 , 3,. . . alone. However, the fact is that 



no such simple repeats were found in any of the 109 TR 
fragments suggests that the SSM is not the most suitable 
model to account for the hypervariable features in P. 
cornutus mtTRs. Similarly, it seems the IEM model can- 
not explains our data as this mechanism only takes ef- 
fect at the 5' end of the CR, while we observed repeated 
tracts at the 3' end of the CR in this flatfish. Another 
mitochondrial model, IIM, would yield identical repeated 
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arrays in length and location from tRNA e to the origin 
of heavy strand replication, and this feature is not seen 
in P. cornutus mtTRs. Furthermore, the probability of 
false initiation of mt-replication from tRNA Phe , a condi- 
tion for the IIM model, is not sufficiently high to explain 
the extremely hypervariable features in P. cornutus 
mtTRs. The model described by Hayasaka et al [21] is 
suitable for the formation of TRs surrounding the CSB 
region, not a feature of the mtTRs in P. cornutus. 

Repeated tracts in the mt genomes of other species 
such as dogs, wolves, and halibut [5,31] share similar 
hypervariable features to those of P. cornutus. Thus far, 
no mechanistic model can explain the origin of the 
observed variation in these species. 

The characteristics of tandem repeats in other vertebrate 
mt genomes 

In order to explore the characteristics of tandem repeats 
in other vertebrate mt genomes, we retrieved 1,726 
complete vertebrate mt genomes from in five taxonomic 
groups (Pisces, Amphibia, Reptilia, Aves and Mammalia) 
from the NCBI mt genome database. This data set 
includes 2,111 repeated tracts from 853 mt genomes 
(Table 2 and Additional file 1: Table SI). The repeated 
tracts in each mt genome were located. A total of 1,760 
repeat tracts (83.4%) are concentrated mainly in the con- 
trol regions, ranging from 65.4% in Amphibians to 
90.5% in Reptiles, including 929 TRs at the 3' end and 
591 TRs at the 5' end (we were not able to determine 
the relative positions of the remaining 240 TRs). These 
TRs exhibit considerable variation in copy number and 
motif length, with the former ranging from 2 to 158 co- 
pies, and the latter from 1 bp to 478 bp. Unexpectedly, 
16.6% of the TRs are located outside the control regions, 
either entirely or partially in coding regions (179 TRs), 
rRNA genes (45 TRs), tRNA genes (40 TRs) and inter- 
genic regions (112 TRs). The length of most TRs in co- 
ding regions is less than 50 bp and the copy number 
ranges from two to four. This is the first report of so 
many repeated arrays located outside the control region 
(Additional file 1: Table SI). 

The distribution of motif lengths of 2,111 mtTRs ty- 
pically ranges between 6-22 bp (Figure 3a). The quantity 



of even-numbered motif lengths (2, 4, 6, 8 and 10 bp) is 
apparently more than those with odd numbers (3, 5, 7, 
9 bp and 11 bp), a feature that has also been reported in 
nuclear genomes [32]. Because mtTRs are primarily 
located in the 3' or 5' ends of the CR and coding 
sequences, the motif length distributions at those 
locations were further analyzed, revealing length diver- 
sity (Figure 3b,c,d). 

The TR motifs at the 3' end are typically of two sizes: 
2-4 bp and 6-12 bp (Figure 3b); those at 5' end are typ- 
ically much longer (17 to 19 bp; Figure 3c). Further ana- 
lyses show that most motifs at the 5' end (>60%) contain 
TAS blocks, which are not observed in TRs at the 3' 
end. The motif length in coding sequences is usually 
12-18 bp, and tends to vary in multiples of three bases 
(12, 15, 18; Figure 3d). It is assumed that a leading rea- 
son for the distribution of 17-19 bp motifs at the 5' end 
is due to the selection pressure on TASs, because the 
TAS length is approximately 15 bp [33], while selection 
pressure to maintain reading frame leads to a three-base 
variation in coding sequences based on the triplet code. 

It is generally believed that the indel of short motifs 
(microsatellites) can be explained by the SSM model 
[13,14]. The 1-4 bp repeats located at the CR 3'end 
(Figure 3b) may also be explained by this mechanism. 
There are, however, far more mtTRs with 6-22 bp 
motifs than those with shorter ones (Figure 3a). It is dif- 
ficult to explain the generation of these larger motifs by 
the SSM model alone because the incidence of SSM 
decreases rapidly with increasing motif length. Addition- 
ally if long-motif mtTRs are generated by recombination, 
as happens in nuclear TRs [12], the low probability of 
mt recombination is not likely to generate the large 
number of mtTRs observed. Therefore, a new model of 
TR motif generation is needed. 

A novel model of Pause-Melting Misalignment (PMM) 

Currently, no model can satisfactorily explain the vari- 
ability of motif indels in mtTRs, particularly outside the 
CR region; the uneven distribution of motif sizes; and 
the concentration of mtTRs at both termini in the CR. A 
novel model of Pause Melting Misalignment (PMM) is 
proposed. 



Table 2 Characteristics of repeated tracts based in vertebrate mt genomes from NCBI 

Categories Species Species with TRs Total TRs At CRs CR/Total At 3'* At 5'* 37CR 

Pisces 932 313 610 490 80.33% 156 316 33.05% 

Amphibia 110 86 240 157 65.42% 59 64 47.97% 

Reptilia 172 126 526 476 90.49% 290 84 77.54% 

Aves 155 98 229 197 86.03% 144 28 83.72% 

Mammalia 357 230 506 440 86.96% 280 99 73.88% 

Total 1726 853 2111 1760 83.37% 929 591 61.12% 
* 3' or 5' means 3' or 5' end in the CRs. 
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Figure 3 The motif length distribution of mtTR tracts, a. total 21 1 1 mtTRs; b. mtTRs at the 3' end in the CRs; c. mtTRs at the 5' end in the 

CRs; d. mtTRs of CDSs. 



The process of PMM is activated by a pause event dur- 
ing mt-replication (Figure 4a). During this pause event, 
dynamic competition between the N-strand and the 
D-strand can lead to the melting of the N-strand from 
the T-strand (Figure 4b). As a result, the N-strand is par- 
tially displaced by the D-strand. When the N-strand suc- 
cessfully rebinds to the T-strand, mispairing between 
complementary or near-complementary sequences 



would occur easily due to possible single strand folding 
(Figure 4c). When next round of mt-replication or repair 
takes place, insertion or deletion of one or several motifs 
could occur (Figure 4d). If the self-complementary pro- 
trusion resides in the N-strand (Figure 4c), after the next 
round of mt-duplication, a repeat segment would be 
added (Figure 4d). If the protrusion resides in the 
T-strand (Figure 4c), the N-strand would delete a repeat 
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5' motif 1 motif2 motif3 motif 4 T-strand 3' 




5' motif 1 motif2 motif2 motif3 motif 4 3' 5' motif2 motif3 motif 4 3' 



Figure 4 The formation of a motif indel via the PMM mechanism, a. the pause stage of mt-replication; b. competition and melting of the N- 

strand and D-strand; c. misalignment and protrusion; d. a repeated motif is inserted or deleted. 
I J 



segment after the next round of mt-replication 
(Figure 4d). 

The pause event of mitochondrial replication 

During mt-replication, the pause event is the trigger for 
PMM. A pause event of mt-replication at the 5' end in 
the CRs is likely since elongation of N-strands is arrested 
at TAS blocks at the 5' end [33,34]. Similarly, we predict 
frequent pause events at the 3' end in the CRs, because 
the replication initiation and closure domain of N- 
strands exist at this domain, many replication events 
such as segregation of N-strand molecules and super- 
helical turns [35] occur while the replication forks are 
traversing this section. These cases are likely to create 
frequent pause events in the replication process. 

In addition to frequent replication pauses at the 5' and 
3' ends, Koike and Wolstenholme [36] point out that 
mtDNA replication is a discontinuous process. Based on 
the observation of different molecular forms of mtDNA 
in electron micrographs, they found that at least 44% of 
the mt genome contains discrete positions at which 
DNA synthesis tends to be arrested. By using a combin- 
ation of one- and two-dimensional agarose gel electro- 
phoresis and solution hybridization to strand-specific 



probes, Mayhook et al. [37] proved that replication 
pauses exist at ATP6/COIII gene positions. These mt- 
replication pauses could occur not only at the CR, but 
also at other positions including protein coding 
sequences, tRNAs, and rRNAs. These pause events 
could trigger the PMM process, and lead to the forma- 
tion and motif indel of mtTRs. 

Dynamic competition and melting 

Buroker [18] reports that competitive equilibrium and 
melting occurs between the N-strand and D-strand at 
the TAS domains that could be followed by misalign- 
ment in repeated sequence in the white sturgeon mt 
genome. Because replication pause events exist at many 
sites during mt-replication, so too could the dynamic 
competition, double strand melting, and subsequent 
mispairing occur at multiple locations. These processes 
would provide the opportunity for the formation of re- 
petitive motifs (Figure 4a,b,c). While mt-replication 
proceeds around the circular mt genome, the proportion 
of the D-strand binding to the T-strand gradually 
decreases from almost-entirely double stranded, to be 
completely displaced at the mt-replication termini 
(Figure 5a,b). Compared with the N-strand, the 
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Figure 5 The melting event at the 3' end in the CR. a. the D-strand is completely displaced at the 3' end; b. the competitive state of the 
D-strand and the N-strand at 3' end. 



competitive strength of D-strand rebinding to the 
T-strand should therefore become much reduced, which 
would lead to a shorter melting length of the N-strand. 
This may be one of the reasons that the motif length at 
the 3' end of the CR is generally shorter (Figure 3b,c). 

Folding and mispairing 

Due to the competition of the D-strand, a portion of the 
N-strand is displaced from the T-strand, thus creating a 
short single-stranded status for both the N-strand and 
T-strand. These regions may then fold to form a stable 
intra-molecular secondary structure (Figure 4c), produ- 
cing the shortened single strand of both strands, and 
then the reannealing of these regions could lead to 
mispairing. Induced stable secondary structures are 
thought to be a leading reason for mispairing during 
mtDNA replication [18,19], however, in our vertebrate 
mtTR data, the lengths of many motifs are 6, 8 bp or 
even 2-5 bp. Motifs of these lengths are not known to 
form stable secondary structures. Even if some motifs 
are known to form hairpin structures, such hairpins are 
usually unstable due to their high free energy (exceeding 
the criterion value, -0.6 kcal/mol). Thus, secondary 
structure formation does not appear to be necessary for 
the TR indel events. 

The inferred process for short motif mispairing and 
folding is as follows: The 3' end of the N-strand mispairs 
with complementary sequences without any secondary 
structures formed (Figure 6a). The N-strand then grad- 
ually rebinds backwards with complementary bases until 
the unpaired bases are forced to form a hairpin, loop, or 
bulge, when the unpaired bases are longer than 8, from 
6 to 8, or shorter than 6, respectively (Figure 6b,c)[38]. 
The bulge could slip backwards or forwards on the tem- 
plate strand in the labile state due to steric crowding of 
thymidine. If the bulge slips to the end of N-strand, it 
would be released (Figure 6c). The phenomenon of 
bulge slip may explain why motifs shorter than 6 bp 
(particularly 2-4 bp) are rarer than longer ones 
(Figure 3a). Another explanation may be that 2-4 
nucleotides motifs cannot fold to any secondary 



structure, so the melted strands cannot be shortened, 
and the incidence of mispairing is decreased. 

The generation of M-TR motifs in P. cornutus 

The mtTRs in P. cornutus are characterized by the pres- 
ence of three motif types (3', M, 5'-TR). Two types 
(3 ; , 5'-TR) were found in all 20 specimens, but the 
M-TR was only observed in DS-03 and QD-07 (Table 1). 
In three clones of DS-03, 58 of the 65 bases section is 
copied once and forms a new repeated tract; in 24 clones 
of QD-07, 48 of the 65 bases section is copied four 
(16 clones) or five times (8 clones). In the other 18 
individuals, however, the same 65 bp section is not 
repeated (Table 1). These repeat tracts are indicative of 
the generation of new mtTRs. The new motifs share a 
feature in which the first several bases of the repeated 
motif are the same as the bases after the repeated motifs 
(bold sequences in Table 1). We suggest that the PMM 
model could be the mechanism by which the M-TR is 
formed (Figure 7). Specifically, after the melting of the 
N-strand due to dynamic competition, the 3' end 
sequences of the N- strand inaccurately rebind (mispair) 
to complementary sequences of the T-strand. In the next 
round of mt-replication, a new repeat would be 
generated (Figure 7). 

The PMM model can readily explain key features of 
mtTRs in vertebrates 

The PMM model differs from existing models of mtTR 
generation in several key aspects. Unlike the SSM model, 
the PMM model is specifically proposed for the origin- 
ation and indel of motifs in mtTRs. It emphasizes the 
pause events as the trigger, followed by the melting of 
the N-strand from the T-strand, and subsequent folding 
and the mispairing. In contrast to the IEM and IIM 
models, the PMM model can explain the formation and 
indel of mtTRs in any locations of the mt genome. Fur- 
thermore, the PMM model can explain the origin of 
many characteristics of mtTRs in vertebrate mt genomes 
that the other models cannot. 
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In summary, the following characteristics of mtTRs 
can be explained by the PMM model: (i) The formation 
and motif indels of mtTR at any location in the mt gen- 
ome can be produced by PMM mechanisms, even those 
outside the CR. Previous models do not address the pro- 
duction of indels at these locations, (ii) The concentra- 
tion of large mtTRs at the 3' and 5' ends in the CRs is 
explained by the PMM model Because there are stable 
pause events at both ends of the CR during mt- 
replication and the selection pressure upon the control 
region is less than that on other functional regions [1], 
additional TRs can be retained at these locations, (iii) 
Shorter motif lengths at the 3' end could be due to 
reduced competition between the N-strand and the 
T-strand compared to that at the 5' CR end. (iv) The 
generation of the M-TR motif in P. cornutus provides 
evidence for the origin of mtTRs by PMM mechanisms, 
(v) The PMM model can explain the hypervariable 
features of 3' -TRs in P. cornutus. Because mt-replication 
frequently pauses at the 3 f end of the CR, PMM would 
be triggered to generate a new copy or motif indels in 
existing mtTRs. The newborn motifs would subse- 
quently increase the incidence of mispairing, which in 
turn would increase the frequency of motif indels. Fur- 
thermore, through the PMM model, more than one 
motif could be inserted or deleted in a single event, 
hence the generation tempo of motif indels greatly 
increases. 

Conclusions 

MtTRs of P. cornutus exhibit hypervariable characteristics, 
thus the 3'-TR-v motifs could act as molecular markers 
for the exploration of the newly generated of mtTRs as 
well as future indels. Based on analyses of the mtTRs in 
P. cornutus and vertebrate mt genomes, we developed a 
novel model of Pause Melting Misalignment for the for- 
mation and motif indel of tandem repeats in the mt ge- 
nome. The model can explain the characters of TRs in 
available mt genomes, and we hope this model will pro- 
vide an important basis for the explanation of repeated 
region evolution in mt genomes. 
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